3-6: Main Pipeline Function
Because steps 3-6 are identical whether scans are pre-selected or chosen with pull_scan_numbers(), the isoforma_pipeline() function runs them all together.
IsoForma is a package for quantifying positional isomers (QPI) in MS2 spectral data. Currently, analysis of this type of data requires several separate tools, which is inconvenient and time-consuming. The goal of this software is to offer all the functionality needed for this analysis in one streamlined package.
Much of the backend functionality, including the generation of metadata objects, is drawn from the pspecterlib package. See the pspecterlib documentation for more information about the backend.
IsoForma was built to ingest two main types of data: 1) an MS file (XML-based or ThermoFisher raw) or 2) a list of peak_data objects that can be generated with pspecterlib. If an MS file is provided, automatic MS2 peak detection options are provided. Otherwise, the provided peak data is simply summed together.
Here are the general steps of the IsoForma algorithm and their respective functions:
1. Select scan numbers: Either manually or with pull_scan_numbers()
2. Sum peaks: sum_ms2_spectra()
3. Match experimental and literature fragments for every proteoform: fragments_per_ptm()
4. Sum isotopes and charge states per fragment per proteoform: sum_isotopes()
5. Calculate an abundance matrix: abundance_matrix()
6. Calculate proteoform relative proportions: calculate_proportions()
Steps 3-6 can be run all together with our main pipeline function.
To select scan numbers, either use the pull_scan_numbers() function to automatically detect and suggest MS2 peaks, or select them yourself and build pspecterlib peak_data objects from them. See ?pspecterlib::make_peak_data or ?pspecterlib::get_peak_data.
# Make a list of pspecterlib peak_data objects
PeakDataList <- list(
readRDS(system.file("extdata", "PeakData_1to1to1_1.RDS", package = "isoforma")),
readRDS(system.file("extdata", "PeakData_1to1to1_2.RDS", package = "isoforma")),
readRDS(system.file("extdata", "PeakData_1to1to1_3.RDS", package = "isoforma"))
)
head(PeakDataList[[1]]) %>% knitr::kable()
| M/Z | Intensity | Abundance |
|---|---|---|
| 151.5681 | 483.0363 | 0.0313 |
| 151.5682 | 930.2599 | 0.0603 |
| 151.5683 | 1144.0471 | 0.0742 |
| 151.5684 | 1003.7305 | 0.0651 |
| 151.5686 | 631.3395 | 0.0409 |
| 151.5922 | 461.2453 | 0.0299 |
The peak summing function will either take a scan_metadata object from pspecterlib and the selected scan_numbers from pull_scan_numbers() and sum the results, or use a list of pspecterlib peak_data objects. This function will return one summed peak_data object.
# Sum selected peaks together
PeaksSum <- sum_ms2_spectra(
PeakDataList = PeakDataList,
PPMRound = 5,
MinimumAbundance = 0.01
)
head(PeaksSum) %>% knitr::kable()
| M/Z | Intensity | Abundance |
|---|---|---|
| 150.2730 | 693.4443 | 0.0153 |
| 150.2735 | 3721.4915 | 0.0820 |
| 150.9515 | 1706.9498 | 0.0376 |
| 150.9520 | 3536.4570 | 0.0779 |
| 150.9740 | 3425.3976 | 0.0754 |
| 150.9745 | 670.2090 | 0.0148 |
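
To make the idea of peak summing concrete, here is a minimal base R sketch that bins m/z values from several spectra, adds the intensities in each bin, and drops low-abundance peaks. It is not the sum_ms2_spectra() implementation: the helper name sum_peaks_sketch, the fixed bin width (used here in place of the ppm-based rounding that PPMRound suggests), and the definition of abundance as a percentage of the most intense peak are all assumptions made for illustration.

```r
# Hypothetical helper, NOT the isoforma implementation: it only shows the general idea.
sum_peaks_sketch <- function(peak_list, bin_width = 0.001, min_abundance = 0.01) {
  # Stack all spectra into one table
  all_peaks <- do.call(rbind, peak_list)
  # Assign every peak to an m/z bin of fixed width (assumption)
  all_peaks$bin <- round(all_peaks$mz / bin_width) * bin_width
  # Add up the intensities that fall into the same bin
  summed <- aggregate(intensity ~ bin, data = all_peaks, FUN = sum)
  names(summed) <- c("mz", "intensity")
  # Abundance as a percentage of the most intense peak (assumption)
  summed$abundance <- summed$intensity / max(summed$intensity) * 100
  # Drop peaks below the abundance cutoff
  summed[summed$abundance >= min_abundance, ]
}

# Two toy spectra with made-up intensities
spectra <- list(
  data.frame(mz = c(150.2730, 150.9520), intensity = c(200, 1000)),
  data.frame(mz = c(150.2731, 150.9740), intensity = c(300, 0.05))
)
result <- sum_peaks_sketch(spectra)
print(result)
```

The two peaks near 150.273 land in the same bin and are summed, while the very weak peak at 150.974 falls below the cutoff and is removed.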
To generate all proteoforms to test, use the pspecterlib::multiple_modifications function. Then, pass that list of sequences to the fragments_per_ptm function. If the isotoping algorithm crashes, consider switching to IsotopeAlgorithm = "isopat". This function will return a list of matched_peak objects from pspecterlib.
# Generate a list of PTMs to test
MultipleMods <- pspecterlib::multiple_modifications(
Sequence = "LQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG",
Modification = "6.018427,V(17,26,70)[1]",
ReturnUnmodified = TRUE
)
# Calculate fragments per proteoform
AllFragments <- fragments_per_ptm(
Sequences = MultipleMods,
SummedSpectra = PeaksSum,
PrecursorCharge = 11,
ActivationMethod = "ETD",
CorrelationScore = 0, # Here, we don't care about correlation score filtering
Messages = FALSE
)
head(AllFragments[[2]]) %>% knitr::kable()
| PPM Error | Ion | Z | Isotope | M/Z | M/Z Experimental | M/Z Tolerance | Isotopic Percentage | Intensity Experimental | Correlation Score | Type | General Type | Modifications | Molecular Formula | Position | N Position | Residue | Sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| -9.7607150 | c12 | 6 | M | 225.4787 | 225.4765 | 0.0022548 | 100.00000 | 684.4547 | NA | c | c | | C63H109N15O17 | 12 | 12 | T12 | LQIFVKTLTGKT |
| 0.1350431 | c2 | 1 | M | 259.1765 | 259.1765 | 0.0025918 | 100.00000 | 2941532.0312 | NA | c | c | | C11H21N3O4 | 2 | 2 | Q2 | LQ |
| -8.1864786 | c2 | 1 | M+1 | 260.1851 | 260.1830 | 0.0026019 | 13.80402 | 2555.8947 | NA | c | c | | C11H21N3O4 | 2 | 2 | Q2 | LQ |
| 0.3082109 | z23 | 9 | M | 295.6129 | 295.6130 | 0.0029561 | 68.07372 | 683.7947 | NA | z | z | | C116H197N37O35 | 23 | 54 | R54 | RTLSDYNIQKESTLHLVLRLRGG |
| 6.1756975 | c21 | 7 | M | 334.7684 | 334.7705 | 0.0033477 | 76.52361 | 1072.7388 | NA | c | c | 6.018427=6.018427@V17 | C106H178N24O34 | 21 | 21 | D21 | LQIFVKTLTGKTITLEVEPSD |
| -0.0671573 | c3 | 1 | M | 372.2605 | 372.2605 | 0.0037226 | 100.00000 | 956499.5312 | NA | c | c | | C17H32N4O5 | 3 | 3 | I3 | LQI |
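
The PPM Error and M/Z Tolerance columns can be reproduced approximately by hand. The sketch below uses the rounded m/z values printed in the table, so its results differ slightly from the table's own PPM errors. The 10 ppm tolerance is an assumption inferred from the ratio of M/Z Tolerance to M/Z in the table, not a documented default.

```r
# Theoretical and experimental m/z for ions c2 and c12, copied from the table above
theoretical  <- c(c2 = 259.1765, c12 = 225.4787)
experimental <- c(c2 = 259.1765, c12 = 225.4765)

# Signed PPM error relative to the theoretical m/z
ppm_error <- (experimental - theoretical) / theoretical * 1e6

# Absolute m/z window for a 10 ppm tolerance (assumed, inferred from the table)
mz_tolerance <- theoretical * 10 / 1e6

# A peak counts as a match when it falls inside the window
is_match <- abs(experimental - theoretical) <= mz_tolerance

print(round(ppm_error, 2))
print(signif(mz_tolerance, 5))
print(is_match)
```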
sum_isotopes() returns a table of summed intensities per fragment.
IsotopesSum <- sum_isotopes(IsoformaFragments = AllFragments)
head(IsotopesSum) %>% knitr::kable()
| Ion | Summed Intensity | Proteoform |
|---|---|---|
| c10 | 3471537.65 | UnmodifiedSequence |
| c11 | 1424215.52 | UnmodifiedSequence |
| c12 | 11049.11 | UnmodifiedSequence |
| c13 | 128408.18 | UnmodifiedSequence |
| c14 | 573659.49 | UnmodifiedSequence |
| c15 | 182678.46 | UnmodifiedSequence |
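
As a rough illustration of this step, the base R sketch below sums the isotopic peaks of each ion. The intensities are the c2 and c3 rows from the matched-fragment table shown earlier (the proteoform carrying the modification at V17). The column names and the use of aggregate() are assumptions for illustration, not the internals of sum_isotopes().

```r
# Matched peaks for ions c2 (M and M+1) and c3 (M only), copied from the fragment table
fragments <- data.frame(
  Ion = c("c2", "c2", "c3"),
  Isotope = c("M", "M+1", "M"),
  Intensity = c(2941532.0312, 2555.8947, 956499.5312),
  Proteoform = "6.018427@V17"
)

# Collapse isotopes (and, in real data, charge states) into one value per ion and proteoform
summed <- aggregate(Intensity ~ Ion + Proteoform, data = fragments, FUN = sum)
print(summed)
```

The c2 total from these two rows agrees with the c2 value reported in the abundance matrix for this proteoform. The c3 total here is lower than the reported one because only its monoisotopic row appears in the truncated table.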
abundance_matrix() returns an abundance matrix for a selected ion group, where each row is a fragment and each column is a proteoform. The values are summed intensities.
# Select your ion group of choice when calculating the abundance matrix
AbunMat <- abundance_matrix(
SummedIsotopes = IsotopesSum,
IonGroup = "c"
)
head(AbunMat) %>% knitr::kable()
| Ion | 6.018427@V17 | 6.018427@V26 | 6.018427@V70 |
|---|---|---|---|
| c2 | 2944087.9 | 2944087.9 | 2944087.9 |
| c3 | 957655.9 | 957655.9 | 957655.9 |
| c4 | 665981.4 | 665981.4 | 665981.4 |
| c5 | 635080.4 | 635080.4 | 635080.4 |
| c6 | 405053.6 | 405053.6 | 405053.6 |
| c7 | 1050680.0 | 1050680.0 | 1050680.0 |
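
The reshaping behind an abundance matrix can be sketched in base R as a filter followed by a long-to-wide pivot. The data below are made up, and the column and proteoform names are hypothetical. This shows only the general shape of the operation, not how abundance_matrix() is implemented.

```r
# Long table shaped like the sum_isotopes() output (made-up values)
long <- data.frame(
  Ion = c("c2", "c3", "z5", "c2", "c3", "z5"),
  SummedIntensity = c(100, 50, 70, 100, 20, 90),
  Proteoform = rep(c("ModA", "ModB"), each = 3)
)

# Keep a single ion group, here the c ions
c_ions <- long[startsWith(long$Ion, "c"), ]

# Pivot to one row per ion and one column per proteoform
wide <- tapply(c_ions$SummedIntensity, list(c_ions$Ion, c_ions$Proteoform), sum)
print(wide)
```

Ion and proteoform combinations that are absent from the long table would appear as NA in the pivoted matrix.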
calculate_proportions() returns both a table and a plot.
Proportions <- calculate_proportions(AbundanceMatrix = AbunMat)
## Profiling...
## Profiling...
Proportions[[1]] %>% knitr::kable()
| Modification | Proportion | LowerCI | UpperCI |
|---|---|---|---|
| 6.018427@V17 | 0.2341259 | 0.1769313 | 0.3078882 |
| 6.018427@V26 | 0.3299839 | 0.2888934 | 0.3699363 |
| 6.018427@V70 | 0.4358902 | 0.3959378 | 0.4769807 |
Proportions[[2]]
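
A quick sanity check on a result like this is to confirm that the proportions sum to 1 and that each estimate lies inside its confidence interval. The sketch below rebuilds the table above by hand as a plain data frame, so it does not depend on the structure of the object returned by calculate_proportions().

```r
# Values copied from the proportions table above
proportions <- data.frame(
  Modification = c("6.018427@V17", "6.018427@V26", "6.018427@V70"),
  Proportion = c(0.2341259, 0.3299839, 0.4358902),
  LowerCI = c(0.1769313, 0.2888934, 0.3959378),
  UpperCI = c(0.3078882, 0.3699363, 0.4769807)
)

# Relative proportions should add up to 1 (within rounding)
total <- sum(proportions$Proportion)

# Each point estimate should fall inside its interval
inside <- with(proportions, LowerCI <= Proportion & Proportion <= UpperCI)

print(total)
print(inside)
```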
Visualize fragment identifications for multiple PTMs in a single large, interactive plotly display.
annotated_spectrum_ptms_plot(
SummedSpectra = PeaksSum,
IsoformaFragments = AllFragments
)